Abstract

Robot control in nonlinear and nonstationary run-time
environments presents challenges to traditional software
methodologies. In particular, robot systems in “open” domains
can only be modeled probabilistically and must rely
on run-time feedback to detect whether hardware/software
configurations are adequate. Modifications must be effected
while guaranteeing critical performance properties.
Moreover, in multi-robot systems, there are typically many
ways in which to compensate for inadequate performance.
The computational complexity of high-dimensional sensorimotor
systems prohibits the use of many traditional centralized
methodologies.

We present an application in which a redundant sensor
array, distributed spatially over an office-like environment,
can be used to track and localize a human being while
reacting at run-time to various kinds of faults, including
hardware failure, inadequate sensor geometries, occlusion,
and bandwidth limitations. Responding at run-time requires
a combination of knowledge regarding the physical
sensorimotor device, its use in coordinated sensing operations,
and high-level process descriptions. We present a
distributed control architecture in which run-time behavior
is both preanalyzed and recovered empirically to inform local
scheduling agents that commit resources autonomously
subject to process control specifications. Examples will be
available from our search and rescue platform¹.

1  Introduction 

High-level deliberation and low-level reactivity are
valuable in the control of autonomous and self-adaptive
systems. A successful implementation of such a hybrid
architecture would permit the system to make use of prior
knowledge when appropriate and to respond quickly to run-time
data. The central open question appears to be deciding
how reacting and deliberating should interact in a constructive
fashion. We have adopted a perspective in which
the control hierarchy is adaptive at every level. Low-level
control processes parameterized by resources interact with
the domain continuously and recover context observable by
the working set of control. This kind of context feedback
can be used to identify the current run-time environment
and permits the high-level process planner to re-deploy resources
so as to address the goals of the system and to
change the kinds of context feedback available. Over time,
robust plans for interacting with specific problem domains
are compiled into rich, comprehensive reactive policies.
State descriptions evolve to express likely
run-time context at the highest levels and reactive policies
adapt to handle run-time contingencies at the lowest levels.

* This work was supported by AFRL/IFTD under F30602-97-2-0032
(SAFER), DARPA/ITO DABT63-99-1-0022 (SDR Multi-Robot), and
NSF CDA-9703217 (Infrastructure).

We are concentrating on how sensory and computational
resources, distributed in a non-uniform manner over multiple
mobile platforms, can be coordinated to achieve mission
objectives. Our approach relies on technologies that
produce flexibility, resourcefulness, high performance, and
fault tolerance. Specifically, we are interested in (1) how
cross-modal sensory front-ends can be designed to provide
mission-specific percepts, (2) how perceptual behavior can
incorporate sensory information derived from two or more
robotic platforms carrying different sensors and feature
extraction algorithms, and (3) how team resources can be
organized effectively and how low-level sensory and motor
activity can be scheduled to achieve multiple simultaneous
objectives.

A family of resource scheduling policies, called Behavior
Programs (B-Pgms), is downloaded into each member
of a working group of robots as part of the configuration
process. Each B-Pgm contains a set of (previously evaluated)
contingency plans with which to respond to a variety
of likely run-time contexts. This policy is responsible
for orchestrating the run-time behavior of the system in
response to percepts gathered on-line. The temporal history
of states produced by a particular B-Pgm defines a run-time
context and supports probabilistic performance predictions
for the team. These predictions are continuously refined
and updated for use in higher-level planning. Run-time
contexts may be handled by making use of contingency
plans in the B-Pgm, by re-deploying resources, or
by replanning at the mission/process planning level if
required.

The UMass hybrid architecture is based on a set of primitive,
closed-loop control processes. This framework allows
hierarchical composition of the controllers into behavior
programs (B-Pgms) for tracking, recognition, motion
control, and for a more complex human tracking scenario.
We are developing schemes for automatically programming
behavior in a variety of meaningful contexts and
subsequently using these policies as abstract actions in a
growing Semi-Markov Decision Process (SMDP). These
hierarchically organized processes are implemented in a
distributed, real-time environment in which we are developing
mechanisms for multi-threaded behavior. Moreover,
the multi-robot platform is designed to respond to multiple,
simultaneous objectives and reasons about resources using
a high-level process description and control procedure
expressed in the Little-JIL process description language. Our goal is
an ambitious, vertically integrated software environment in
which run-time data sets drive the organization of behavior
and contribute to the management of large and comprehensive
software systems. This document describes the very
first experiments employing this paradigm.

2  Sensory  Primitives  for  Motion  Tracking 

A multi-objective system requires sensory algorithms
that are flexible, to support adaptation, and reconfigurable
on-line, to facilitate fault tolerance. Our approach is designed
to provide a set of sensor processing techniques that
can fulfill both low-level and high-level objectives in an
open environment. Cooperative interaction among members
of the robot team requires the mission planner to be
effective in utilizing system resources across team members,
including robot platforms, sensors, computation, and
communication. In particular, we are constructing virtual
robot behaviors across multiple coordinated platforms and
multiple sensors. To achieve the desired robustness, our
platform is configured with a variety of sensors and algorithms.
Vision is the primary sensing modality, but it is
complemented by inexpensive pyroelectric sensors, sonar,
infrared proximity sensors, and (in the future) acoustic sensors.
Multiple types of sensors are distributed across
multiple robot platforms to allow flexibility
in mission planning and resource scheduling in response to
hardware and algorithm failures.


2.1  Panoramic  Imaging 

Effective combinations of transduction and image processing
are essential for operating in an unpredictable environment
and for rapidly focusing attention on important activities
in the environment. A limited field-of-view (as with
standard optics) often causes the camera resource to be
blocked when multiple targets are not close together, and
panning the camera to multiple targets takes time. We employ
a camera with a panoramic lens² to simultaneously detect
and track multiple moving objects in a full 360-degree
view [4, 10, 13].


Figure  1.  Original  panoramic  image  (768  x  576) 


Figures 1, 2, and 3 depict the processing steps involved
in detecting and tracking multiple moving humans. Figure 1
shows one of the original panoramic images from a
stationary sensor. Four moving objects (people) were detected
in real-time while moving in the scene in an unconstrained
manner. A background image is generated automatically
by tracking dynamic objects through multiple
frames. The number of frames needed to completely build
the background model depends on the number of moving
objects in the scene and their motion. The four moving objects
are shown in the un-warped cylindrical image of Figure 2,
which is a more natural panoramic representation for
user interpretation. Each of the four people was extracted
from the complex cluttered background and annotated with
a bounding rectangle, a direction, and an estimated distance
from the sensor based on scale. The system tracks each
object through the image sequence as shown in Figure 3,
even in the presence of overlap and occlusion between two
people. The dynamic track is represented as an elliptical
head and body for the last 30 frames of each person and
the final position on the image plane is illustrated in Figure 2.
The human subjects reversed directions, overlapped,
and occluded one another during this sequence. The vision
algorithms can detect self-motion of the robot, changes in
the environment and in illumination, and sensor failure, while
refreshing the background accordingly. The detection rate
of the current implementation for tracking two objects is
about 5 Hz.

² PAL-3802 system, manufactured by Optechnology Co.

Figure 2. Un-warped image, four moving people detected


Figure 3. Track through image sequence for the last 32 frames

The motion detection algorithm relies heavily on the accuracy
of the background model at any given time in order
to detect moving objects. Changes in the background
can be broadly grouped into two categories:

•  changes due to illumination, which affect pixel intensities
at a fine scale; and

•  changes of surfaces in the environment, such as the
movement of objects.

It is quite difficult to take care of both cases simultaneously
because the first type requires a constant update while
the second type requires a context-dependent update. The
low-level background estimation procedure is quite simple.
The constant update is done on those regions of the image
that are not classified as a moving object by the motion detection
algorithm. We track each region and keep a history
of velocity for each as well. When the velocity of a region falls below
a threshold and remains so for a period of time, the region becomes
a suitable candidate for part of the background. The assumption
is made that humans will not be still for a long
period of time. Therefore, they do not become part of the
background. Similarly, only when the velocity of an object
exceeds a threshold is it classified as a possible human
subject. This helps to avoid detecting objects that
should remain part of the background but are not completely
stationary, such as moving tree branches or the flicker of
a computer monitor.
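
The velocity-based update rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used on the platform: the names `Region`, `classify_region`, and `blend_background`, the thresholds `V_STILL`, `V_HUMAN`, and `STILL_FRAMES`, and the running-average form of the constant update are all hypothetical, since the text gives no numeric values or update formula.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds; the text does not give numeric values.
V_STILL = 0.5       # speed (pixels/frame) below which a region counts as still
V_HUMAN = 2.0       # speed above which a region may be a human subject
STILL_FRAMES = 50   # frames a region must stay still before absorption


@dataclass
class Region:
    """A tracked image region with a history of speed estimates."""
    speeds: list = field(default_factory=list)
    was_fast: bool = False  # True once the region has exceeded V_HUMAN

    def observe(self, speed):
        self.speeds.append(speed)
        if speed > V_HUMAN:
            self.was_fast = True


def classify_region(region):
    """Return 'background', 'human', or 'undecided' for a tracked region."""
    recent = region.speeds[-STILL_FRAMES:]
    still_long = len(recent) == STILL_FRAMES and all(s < V_STILL for s in recent)
    if still_long:
        return "background"   # candidate for absorption into the background
    if region.was_fast:
        return "human"        # possible human subject
    return "undecided"        # e.g. swaying branches or monitor flicker


def blend_background(bg, frame, moving_mask, rate=0.05):
    """Constant update: running average on pixels not marked as moving."""
    return [b if m else (1.0 - rate) * b + rate * f
            for b, f, m in zip(bg, frame, moving_mask)]
```

Note that under this rule a region that has been still for `STILL_FRAMES` frames is absorbed even if it was once classified as human, which is exactly the limitation that motivates higher-level context.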

The adaptive background update improved the performance
of the panoramic sensors considerably. The above
adaptation only provides a low-level mechanism to handle
the problem of maintaining an accurate background model.
A more elegant way would be to use the context as inferred
by the reasoning at higher levels of knowledge-based planning,
where all resources available might be employed. For
example, an unconscious human will be still, so the low
level will infer this as the background appearance. However,
using the pyroelectric sensor, we might know where
the human is, particularly if the previous motion of that
body had been detected. This information could be passed
to the vision sensors to update the background accordingly.

2.2  Pyroelectric  sensor 

Figure 4. Pyroelectric sensor.

The pyroelectric sensor³ is a lithium tantalate pyroelectric
parallel-opposed dual-element high-gain detector
with complete integral analog signal processing [3]. The
detector is tuned to thermal radiation in the range that is
normally emitted by humans. Since the pyroelectric detector
itself only responds to changes in heat, the detector
must be scanned. As shown in Figure 4, a thermal target is
identified as a zero crossing in the sensor’s data stream. We
have implemented such a sensor on a scanning servo motor
with two control modes: the sensor may saccade to a region
of space designated by another sensor or pair of sensors,
and it can track the thermal signature (even when the subject
is still) by oscillating around the target heading. The
result is a sensor that responds quite precisely to human
body temperature but with a rather poor lateral bandwidth.
This is due primarily to the scanning required to measure a
zero crossing. To use this sensor appropriately, it must be
applied only when the predicted lateral bandwidth of the
subject is relatively small.

³ Model 442-3 IR-EYE Integrated Sensor, manufactured by Eltec Instruments.
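
As an illustration of identifying a thermal target as a zero crossing in the scan data, the following sketch interpolates the heading at which the detector output changes sign. It assumes the scan is available as paired lists of servo headings and detector samples and that linear interpolation between samples is adequate; the function name and the data layout are hypothetical.

```python
def zero_crossing_heading(headings, signal):
    """Return the interpolated heading of the first sign change, or None.

    headings: scan angles (degrees) visited by the servo, in scan order.
    signal:   detector output sampled at each heading.
    """
    for i in range(1, len(signal)):
        s0, s1 = signal[i - 1], signal[i]
        # A crossing needs a nonzero sample followed by a sample that is
        # zero or of opposite sign; a flat zero signal is not a target.
        if s0 > 0.0 >= s1 or s0 < 0.0 <= s1:
            t = s0 / (s0 - s1)  # fraction of the way from sample i-1 to i
            return headings[i - 1] + t * (headings[i] - headings[i - 1])
    return None
```

For example, a scan over headings `[0, 10, 20, 30]` with samples `[1.0, 0.5, -0.5, -1.0]` places the crossing midway between 10 and 20 degrees. Because a crossing can only be found by sweeping across the target, each heading estimate costs one scan, which is the source of the limited lateral bandwidth.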

2.3  Stereo  Head  System 

The stereo head platform⁴ is a high-performance binocular
camera platform with two independent vergence axes.
As shown in Figure 12, it has four mechanical degrees of
freedom and each lens has three optical degrees of freedom
[6].

There are several state-of-the-art tracking algorithms in
the literature [1, 2, 9]. Our tracking algorithm uses one of
the cameras as an active eye and the other as a passive eye.
The active eye detects the subsampled pixels of greatest change
in intensity between two consecutive frames. The passive
eye correlates a multi-resolution fovea with the frame from
the active eye. The stereo head is then servoed to bring the
pixel of greatest change into the fovea of the active eye.
Subsequently, the passive eye is verged to point its fovea
to the same world feature as the fovea of the active eye,
extracting the spatial location of the object.

The accuracy of the spatial location of the object depends
on its distance from the stereo head system. This
algorithm can track only a single moving object.

2.4  SACCADE-FOVEATE B-Pgm for Recovering Heading

The most primitive software process in this approach
is an asymptotically stable closed-loop controller [5, 8].
Controllers suppress local perturbations by virtue of their
closed-loop structure. Some variations in the context of
a control task are simply suppressed by the action of the
controller. Controllers also provide a basis for abstraction.
Instead of dealing with a continuous state space, a behavioral
scheme need only worry about control activation and
convergence events. When a control objective is met, a
predicate is asserted in an abstract model of the system behavior.
The pattern of boolean predicates over a working
set of controllers constitutes a functional state description
in which policies can be constructed. The “state” of the
system is a vector of such functional predicates, each element
of which asserts convergence for some control law
and resource combination. The state vector also, therefore,
represents the set of discrete subgoals available to a robot
given these native control laws and resources.

⁴ BiSight System, manufactured by HelpMate Robotics, Inc.
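
The abstraction from continuous control to a vector of convergence predicates can be sketched as follows. The `Controller` class, its `error` and `tolerance` fields, and the threshold test used to declare convergence are illustrative assumptions; the text only states that a predicate is asserted when a control objective is met.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    """A closed-loop controller bound to a resource (illustrative only)."""
    name: str
    error: float      # current magnitude of the control error
    tolerance: float  # error at or below which the objective counts as met

    def converged(self):
        return abs(self.error) <= self.tolerance


def state_vector(working_set):
    """Map a working set of controllers to a string of 0/1 predicates."""
    return "".join("1" if c.converged() else "0" for c in working_set)
```

With a converged saccade controller and an unconverged foveate controller, `state_vector` yields `"10"`, so a policy can be written over a handful of discrete states rather than over the continuous sensor and motor variables.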

Two  closed-loop  primitives  are  employed  for  motion 
tracking  (see  Figure  5). 


Figure 5. Closed-Loop Primitives for Controlling Attention.

The first, saccade, accepts a reference heading in space
and directs the sensor’s field-of-view to that heading. The
second, foveate, is similar except that it accepts heading
references determined by the sensor’s signal. For example,
the pyroelectric sensor scans a small region centered on the
current gaze and identifies the zero crossing in the sensor
output. The heading to the zero crossing is used as the reference
heading to control the sensor’s gaze. Within bandwidth
limitations, the result is that the pyroelectric sensor
tracks the moving thermal source.

Localizing and tracking the motion of objects in the
world is an important, reusable behavior that can be realized
in a number of different ways using a variety of different
sensors. Each sensor in a stereo pair recovers the heading
to a feature in the environment. When the imaging geometry
of the pair is suitable, the sensors can, in principle, be
used to triangulate the spatial location of the feature. Moreover,
the control process for each sensor can be completely
independent of the other sensor processes. We have hand-crafted
a B-Pgm for accomplishing this task that is parametric
in sensory resources. This B-Pgm is illustrated in
Figure 6; it represents a family of run-time hardware configurations
for estimating the location of moving objects in
space.


Figure 6. Behavior Program for Detecting and Measuring
the Heading to a Motion Cue.

The state in the nodes of Figure 6 is the convergence status
of the saccade controller, $\phi_s$, and the foveate controller,
$\phi_f$. That is, if $\phi_s$ is converged and $\phi_f$ is not, then the state
of the saccade-foveate process is 10. An X in the state
representation represents a “don’t care” or “don’t know”
condition.

The saccade-foveate B-Pgm (or template) for behavior
relies on a concurrent “saccade-foveate” approach. This
strategy begins by directing a sensor $r_i \in R$ to saccade
to an interesting region of space. If this process fails for
some reason, it is presumably an error in the motor component
for the sensor and it reports a fault. If no hardware
fault is detected and the sensor achieves state 1X, then an
independent, periodic, closed-loop process $\phi_f$ is engaged
whose goal is to bring the centroid of the local motion
cue to the center or fovea of sensor $r_i$’s image plane. If no
motion cue is detected, then a report of “no target” is generated.
If a target motion cue is detected and foveated, then
the sensor achieves state X1, in which the target is foveated
and actively tracked, and is likely no longer at the position
specified by the original saccade. As long as subsequent
foveation cycles preserve this state, a heading to
the motion cue is reported. If, however, the sensor state
becomes X0, then the target may be moving too quickly
and a “target lost” report is generated. When two sensors
are simultaneously in state X1, then the pair of active B-Pgms
are reporting sufficient information for triangulating
the spatial location of this motion cue. Under these circumstances,
this B-Pgm produces a hypothesis regarding
the location of a motion cue. Each unique resource allocation
$r_1, r_2 \in R$ produces hypotheses of varying quality
depending on the context of the localization query.
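
The transitions of the saccade-foveate policy for a single sensor can be sketched as a small state machine. This is an illustrative reading of the description above, not the B-Pgm of Figure 6 itself: the function name, the encoding of observations, the state label `"0X"` for a failed saccade, and the choice to keep cycling while a cue is seen but not yet foveated are assumptions.

```python
def saccade_foveate(saccade_converged, foveate_cycles):
    """Run the policy for one sensor; return (final_state, reports).

    saccade_converged: True if the saccade reached its reference heading.
    foveate_cycles: per-cycle observations, each None when no motion cue
        is seen, or a (converged, heading) tuple otherwise.
    """
    if not saccade_converged:
        return "0X", [("fault", None)]       # presumed motor fault
    state, reports = "1X", []                # saccade done, foveate unknown
    for obs in foveate_cycles:
        if state == "1X" and obs is None:
            reports.append(("no target", None))
            break
        converged, heading = obs if obs is not None else (False, None)
        if converged:
            state = "X1"                     # target foveated and tracked
            reports.append(("heading", heading))
        elif state == "X1":
            state = "X0"                     # target may be moving too quickly
            reports.append(("target lost", None))
            break
        # Otherwise a cue is visible but not yet foveated: keep cycling.
    return state, reports


def can_triangulate(state_a, state_b):
    """Two sensors support triangulation only when both are in state X1."""
    return state_a == "X1" and state_b == "X1"
```

Each sensor runs its own instance independently; a supervising process only needs to compare the reported states to decide whether a pair currently supports a location hypothesis.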

This policy does not rely on a specific output type. In
fact, while incorrect correspondence can lead to anomalous
results, cross-modality can be used to advantage. For example,
if the location is computed from consistent visual
motion and pyroelectric information, then we may detect
“warm-moving” bodies. Such a strategy may be attractive
when detecting and localizing human beings as opposed to
other types of moving objects.

2.5  “Virtual”  Stereo  Pairs 

Any fixed-baseline stereo vision system has limited
depth resolution due to the imaging geometry, whereas a
system that combines multiple views from many stationary
or movable platforms allows a policy to take advantage
of the current context and goals in selecting viewpoints.
A “virtual stereo” policy is a policy that engages
different sensor pairs as the target moves through
ill-conditioned sensor geometries. Although this policy is
more flexible than a fixed pair, this approach requires dynamic
sensor (re)calibration, and accuracy in the depth of
a target is limited by the quality of calibration. The virtual
stereo strategy may be particularly effective with a pair
of mobile panoramic sensors because they have the potential
of always seeing each other and estimating calibration
parameters [13]. Once calibrated, they can view the environment
to estimate the 3D information of moving targets
by triangulation, and maintain their calibration during
movement by tracking each other. If two panoramic vision
sensors can see each other and at the same time see
the target motion cue, then they can be used to estimate
the bearing and distance of the target without any off-line
calibration.

$$D_1 = B\,\frac{\sin(\beta_{12} - \theta_2)}{\sin(\beta_{12} - \beta_{21} + \theta_1 - \theta_2)} = B\,\frac{\sin(\alpha_2)}{\sin(\alpha_0)} \qquad (1)$$

where $D_1$ is the distance between the target and the first
camera, $B$ is the distance between the two cameras, $\theta_1$ and
$\theta_2$ are the bearings of the target in image 1 and image 2
respectively, and $\beta_{12}$ and $\beta_{21}$ are the bearings of the image
of camera 1 in image 2 and of camera 2 in image 1 respectively.
Several practical approaches to estimate distance and angles between two
panoramic sensors have been proposed in [13]. The error
of $D_1$ can be estimated by partial differentials of Equation
1 as

$$dD_1 = \frac{\sin(\alpha_2)}{\sin(\alpha_0)}\,dB + B\,\frac{\sin(\alpha_0 + \alpha_2)}{\sin^2(\alpha_0)}\,d\alpha \qquad (2)$$

where $dB$ is the distance error, and $d\alpha$ is the average angle
error. The smaller the angle $\alpha_0$ and/or $B$, the larger is
the error. Notice that $\alpha_0$ and $B$ have some inherent dependency.
Given the distances $D_1$ and $D_2$, $\alpha_0$ and $B$ change
in the same direction (both increasing or both decreasing).


Figure 7. Panoramic stereo geometry
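
Equations 1 and 2 can be evaluated directly, as in the sketch below. It assumes all bearings are given in radians and measured counterclockwise; since only differences of bearings within the same image enter Equation 1, the two images need not share a reference direction. The function names are hypothetical.

```python
import math


def panoramic_distance(B, theta1, theta2, beta12, beta21):
    """Equation 1: distance D_1 from camera 1 to the target."""
    alpha2 = beta12 - theta2
    alpha0 = beta12 - beta21 + theta1 - theta2
    return B * math.sin(alpha2) / math.sin(alpha0)


def distance_error(B, alpha0, alpha2, dB, dalpha):
    """Equation 2: first-order error estimate for D_1."""
    baseline_term = math.sin(alpha2) / math.sin(alpha0) * dB
    angle_term = B * math.sin(alpha0 + alpha2) / math.sin(alpha0) ** 2 * dalpha
    return baseline_term + angle_term
```

As a check of the geometry, place camera 1 at the origin, camera 2 at a distance of 4 along the x-axis, and the target at (0, 3), with bearings measured from the x-axis in both images. Then `theta1 = pi/2`, `theta2 = atan2(3, -4)`, `beta12 = pi`, and `beta21 = 0`, and `panoramic_distance` returns approximately 3, the true distance from camera 1 to the target. The `sin^2(alpha0)` denominator in the angle term makes explicit why the estimate degrades as the target approaches the line through the two cameras.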

We have developed the algorithms for mutual calibration
and 3D localization of motions using a pair of panoramic
vision systems, each running the saccade-foveate B-Pgm.
The first implementation has been carried out by cooperation
between two stationary cameras. Figure 8 shows a
stereo image pair from two panoramic sensors.

Figure 8. 3D localization by the panoramic stereo system


2.6  Peripheral  and  Foveal  Vision  Integration 

The human eye has a very wide-angle, low-resolution
field in its peripheral view and a very high-resolution, narrow
field in its foveal view, a combination that works cooperatively
in a highly robust manner. We can find a moving
object within the peripheral field of view, and then start
a tracking behavior by peripheral-foveal cooperation. The
key point here is the natural cooperation of peripheral and
foveal vision as a real-time behavior operating within a
common coordinate system.

As we consider a computer implementation of this behavior,
we note differences with human capability. Humans
must rotate the head so that the peripheral system covers
the moving object in its field of view. Furthermore, multiple
objects in very different directions cannot be tracked
simultaneously. In our Track Human Containment Unit,
the panoramic-panoramic sensor pair (or any other pair applicable
under the run-time context) can provide the spatial
reference for a saccade-foveate B-Pgm on a standard
zoom camera mounted on a small pan/tilt platform. The
pan/tilt/zoom imaging system may then undergo a saccade
to the interesting motion cue. From here it can foveate on
the cue and zoom if necessary for detailed processing.

High-resolution color images obtained from the
pan/tilt/zoom camera can be used to determine the identity
of the object of interest. In particular, a challenging
problem is to separate and track individuals in a group (or
even a crowd). Using contour extraction algorithms based
on motion cues, the pixels that correspond to the object can
be extracted from the background.

Our general approach is to apply suitable local image
operators to the image to determine the relevant features
of the object. Each known object is represented as a histogram
of these local features. The histogram of the object
being tracked can be matched with the histograms of other
known objects from a database in order to recognize the object.
Object recognition is important when there are multiple
objects being tracked. When the paths of two moving
objects intersect, an ambiguity arises as to whether the
paths did, indeed, cross or whether both objects turned back
upon meeting. We presented such a situation in Figures 2
and 3.
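
Histogram-based recognition of this kind can be sketched as follows. The text does not specify the local features or the matching measure, so this example assumes the feature histograms are already computed and substitutes histogram intersection as the similarity score; the function names and the database layout are hypothetical.

```python
def normalize(hist):
    """Scale a histogram so that its bins sum to 1 (if it is non-empty)."""
    total = float(sum(hist))
    return [h / total for h in hist] if total > 0 else list(hist)


def intersection(h1, h2):
    """Histogram intersection of two normalized histograms (1.0 = identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def recognize(query, database):
    """Return (label, score) of the best-matching known object."""
    q = normalize(query)
    scores = {label: intersection(q, normalize(h))
              for label, h in database.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

When two tracks meet, comparing the histogram of each outgoing track against the histograms recorded before the encounter is one way to decide whether the paths crossed or the subjects turned back.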

We have successfully set up a peripheral and foveal vision
system, and implemented a cooperative algorithm for
processing moving objects. The system detects any moving
object in the view of the panoramic camera, and tracks and
identifies it through the zoom camera. If there are multiple
motion trackers orchestrated in the Human Tracker CU
and multiple pan/tilt/zoom cameras in a distributed sensor
network of stationary and moving platforms, the functionality
of the system should respond gracefully in the face of
hardware and algorithm failures by deploying applicable
subsets of sensors.

Figure 9 illustrates the image resulting from such a process,
where the spatial reference to a motion cue is provided
by the panoramic-panoramic image pair presented earlier
in Figure 8. The suspicious character in this panoramic
image pair has been scrutinized successfully using the
pan/tilt/zoom camera.


Figure 9. A close-up (zoom) image of the human
subject localized using a panoramic-panoramic sensor
pair.

3  The  Containment  Unit


B-Pgms can be used to coordinate the behavior of a fixed
set of resources. In [7], we show how to build policies automatically
using reinforcement learning that approach optimal
policies for a fixed resource allocation. The Containment
Unit (CU) is an active entity designed to represent a
family of optimal contingency plans parameterized by resource
commitments. Its objective is to “contain” faults. A
fault is generally construed to be any functional violation of
the specified behavior associated with the containment unit:
real-time constraints, liveness of constituent hardware, or
performance constraints. If a sensor fails, it is the role of
the containment unit to select an alternative behavioral program
to provide the same type of information and to inform
the process that activated the CU of the impact on the expected
performance. Containment units, therefore, manage
a set of parametric B-Pgms given resource specifications
and report the property associated with the CU and the expected
quality of the result. The CU monitors fault conditions
and responds autonomously to produce the information
requested by a higher-level CU.

Figure 10. The Structure of a Containment Unit.

The structure of a CU is presented schematically in Figure 10.
Multiple instances of a CU may be active concurrently,
each with a resource specification that determines
the range of variation permitted locally in the strategy for
executing the CU directive. Global resource constraints
are achieved by limiting the range of autonomy each CU
enjoys through careful specification of its proprietary resources.
The set of alternative B-Pgms available to the CU
represents all possible coordinated sensory and motor policies
for achieving the objective with system resources. In
general, these policies may be applicable only in prescribed
contexts. For example, adequate illumination may be necessary
to employ those B-Pgms with vision sensors, or limited
target velocity may be required in order to track with
a scanning pyroelectric sensor. These “contexts” can be
loaded when a CU is activated and then verified at run-time,
or they may be recovered by monitoring the active
B-Pgm’s performance. An inappropriate run-time context
can be used to reconfigure the CU locally and/or passed
upward to the process that activated the CU.
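
The selection among alternative B-Pgms can be sketched as a filter over resource liveness and run-time context followed by a choice on expected quality. This is a hedged illustration only: the `BPgm` record, the representation of contexts as sets of named conditions, the numeric `expected_quality` field, and the rule of picking the highest-quality applicable program are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BPgm:
    name: str
    resources: frozenset         # sensors the program needs
    required_context: frozenset  # conditions that must hold at run-time
    expected_quality: float      # prior estimate of result quality in [0, 1]


def select_bpgm(candidates, live_resources, context):
    """Pick the applicable B-Pgm with the highest expected quality, or None."""
    applicable = [p for p in candidates
                  if p.resources <= live_resources
                  and p.required_context <= context]
    if not applicable:
        return None  # nothing applies locally: pass the context upward
    return max(applicable, key=lambda p: p.expected_quality)
```

For instance, if a panoramic-panoramic program requires illumination and a panoramic-pyroelectric program additionally requires a slow target, the loss of one panoramic sensor leaves only the second program applicable, and its lower expected quality is what the CU would report to the process that activated it.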

3.1  CU  Supervisor:  Domain-Independent  Be¬ 
havioral  Expertise 

Some  aspects  of  a  particular  B-Pgm’s  performance  in 
situ  are  determined  entirely  by  attributes  of  the  participat¬ 
ing  resources.  The  most  obvious  example  of  critical  local 
state  is  the  liveness  of  the  participating  hardware.  Other 
locally  determined  attributes  can  also  be  important  with  re¬ 
spect  to  overall  performance.  Consider  a  pair  of  vision  sen¬ 
sors  performing  as  a  virtual  stereo  pair  to  localize  a  moving 
target. Localization will be poor if the uncertainty in the
position of the participating sensors is large, and the saccade-
foveate B-Pgm may behave poorly if the target approaches
a collinear spatial relationship with the sensor pair. These
conditions  are  entirely  determined  by  examining  attributes 
of  the  sensors  (their  relative  spatial  arrangement)  and  the 
result  of  the  B-Pgm  coordinating  them  (the  target  position). 

Circumstances  such  as  these  are  completely  determined 
in  the  local  state  of  the  CU  and  should  be  handled  locally 
without  higher-level  deliberation.  The  CU  depicted  in  Fig¬ 
ure  10  contains  a  local  supervisor  that  accomplishes  this 
objective.  Some  of  the  policies  engaged  by  the  supervisor 
can  be  hand-coded  based  on  knowledge  regarding  the  in¬ 
teractions  between  resources  and/or  known  deficiencies  in 
software  processes  used  to  respond  to  feedback  from  the 
world.  We  will  develop  an  example  of  the  CU  supervisor 
in  Section  5. 

3.2  Context:  Domain-Dependent  Behavioral 
Models 

Open  environments  present  data  sets  to  sensorimotor 
processes  that  cannot  be  predicted  at  process  configuration 
time  in  general  and  must  be  observed  at  run-time.  When 
peculiar  or  unexpected  environments  cause  the  behavior  of 
the  system  to  deviate  from  expectations,  a  higher-level  re¬ 
configuration  must  modify  system  performance  while  re¬ 
maining  within  specifications.  If  a  specific  B-Pgm  proves 
to  be  inadequate  in  a  particular  run-time  context,  the  con¬ 
text  is  passed  upward  in  the  control  hierarchy  to  a  process 
manager  which  may  choose  to  reinstantiate  the  CU  with  a 
different  resource  specification.  Over  time,  some  of  these 
reconfiguration  decisions  that  depend  strongly  on  control¬ 
lable  system  components  might  be  compiled  into  appropri¬ 
ate  CU  supervisors.  However,  other  contexts  will  be  de¬ 
termined  by  the  run-time  environment,  and  the  deliberative 
process  planner  must  model  these  dependencies  at  a  higher 
level. We are studying mechanisms where the process description
can incrementally model these environmentally
determined  contexts  and  manage  resources  so  as  to  recover 
critical  run-time,  environmentally  determined  contexts  in 
the  course  of  the  mission. 

In self-adaptive software systems, an additional dimension
of complexity in decision-making and predictability
is introduced: adaptive changes must be accomplished
with some assurance of correctness and safety. Exhaustive
formal methods or detailed re-planning are often
too time consuming to be executed completely at run-time.
This  system  exploits  a  dynamic  composability  approach, 
wherein  possible  execution  models  are  pre-analysed  at  de¬ 
sign  time  to  determine  which  compositions  could  poten¬ 
tially  execute  successfully.  The  approach  also  generates  an 
expectation  of  the  successful  completion  of  these  compo¬ 
sitions  under  different  anticipated  contingencies  (including 
hardware  faults,  algorithm  failures,  and  deviations  from  ex¬ 
pectations).  This  pre-analysis  will  greatly  reduce  the  over¬ 
head  of  dynamic  decision  making,  by  ruling  out  alterna¬ 
tives  unlikely  to  be  fruitful  and  by  guiding  the  search  pro¬ 
cess  at  run-time  with  this  composability  information.  With 
this,  many  of  the  estimates  can  be  tightened  and  more  ef¬ 
ficient  plans  can  be  generated.  Thus,  when  a  fault  is  re¬ 
ported,  the  Containment  Unit  executes  a  search  over  the 
repository  of  available  B-Pgms  and  chooses  the  one  whose 
pre-analysed composability information is most appropriate
for  the  task,  state  of  the  system,  and  state  of  the  world. 
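
A minimal sketch of this run-time search, assuming the design-time pre-analysis yields, for each composition, an expected probability of success under each anticipated contingency. The names `Composition`, `select_composition`, and `min_success`, the resource labels, and all numbers are hypothetical placeholders, not measured values.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass
class Composition:
    """One pre-analysed B-Pgm composition (hypothetical representation)."""
    name: str
    resources: FrozenSet[str]
    # Design-time estimates: contingency label -> expected success probability.
    expected_success: Dict[str, float]


def select_composition(repository: List[Composition],
                       available: Set[str],
                       contingency: str,
                       min_success: float = 0.5) -> Optional[Composition]:
    """Pick the feasible composition with the best pre-analysed expectation."""
    feasible = [c for c in repository
                if c.resources <= available
                and c.expected_success.get(contingency, 0.0) >= min_success]
    if not feasible:
        return None  # nothing adequate locally: report the fault upward
    return max(feasible, key=lambda c: c.expected_success[contingency])


# Placeholder repository; the probabilities are illustrative only.
REPOSITORY = [
    Composition("panoramic_pair", frozenset({"pan1", "pan2"}),
                {"nominal": 0.95, "lights_off": 0.10}),
    Composition("pan_pyro", frozenset({"pan1", "pyro"}),
                {"nominal": 0.80, "lights_off": 0.20}),
    Composition("stereo_pyro", frozenset({"stereo", "pyro"}),
                {"nominal": 0.70, "lights_off": 0.60}),
]
```

Because the table is built before execution, the run-time cost is a single pass over the repository; alternatives ruled out at design time never enter the search.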

The  memory  structure  illustrated  in  Figure  10  records 
the  reported  results  of  all  participating  resources,  estimates 
the  state  information  required  by  the  local  supervisor,  and 
supports  interpretation  and  reporting  actions  generated  by 
the  CU.  Task  specific  information  such  as  target  loca¬ 
tion  and  current  fault  conditions  are  stored.  The  struc¬ 
ture  is  maintained  by  a  communication  protocol  over  in¬ 
ternet  sockets  between  the  active  B-Pgms  and  the  CU.  If 
resources  reside  on  disparate  architectures  and  operating 
systems,  the  memory  structure  will  also  provide  the  CU 
with  a  common  communication  interface  to  all  subsystems. 
The  memory  structure  is  also  available  to  any  process  run¬ 
ning  on  the  host  computer  and  forms  the  basis  for  the  High 
Level  Interface. 

4 The Little-JIL Agent Coordination Language

Little-JIL [LJIL-ICSE] provides rich and rigorous semantics
for the precise specification of processes that coordinate
multiple agents [11, 12]. In the context of SAFER,
the  agents  consist  of  individual  sensors,  individual  robots, 
or  combinations  of  these.  Little-JIL  provides  constructs  for 
proactive  and  reactive  control,  exception  handling  and  re¬ 
source  management. 


A  Little-JIL  process  defines  a  high-level  plan  to  coor¬ 
dinate  agents  to  act  as  a  loose  team.  A  process  is  con¬ 
structed  of  steps  that  are  hierarchically  decomposed  into 
finer-grained  substeps.  The  steps  and  substeps  are  con¬ 
nected  with  dataflow  and  control  flow  edges.  Each  step 
may  have  a  resource  declaration  identifying  the  resources 
needed  to  carry  out  that  step.  These  resources  include 
the  sensors  and  robots  but  may  also  include  computational 
platforms  and  communication  hardware  to  allow  reasoning 
over  the  sharing  of  these  resources  among  computationally 
and  communicationally  expensive  algorithms. 

A  process  typically  specifies  parts  of  the  coordina¬ 
tion  quite  precisely  while  leaving  some  opportunities  for 
choices  to  be  deferred  until  runtime.  For  example,  pre¬ 
cise  resource  allocation  decisions  are  typically  deferred  to 
runtime (or a pre-runtime analysis stage). In this way a
step  may  be  implemented  in  one  of  several  ways,  each  of 
which  uses  a  different  collection  of  resources.  The  selec¬ 
tion  of  which  choice  is  most  appropriate  may  depend  upon 
which  resources  are  available  at  that  time,  how  quickly  we 
must  perform  the  computation,  how  precise  a  result  we 
must  get,  and  the  physical  environment  at  the  time  of  ex¬ 
ecution.  These  high-level  decisions  that  require  reasoning 
across  the  collection  of  loosely-coupled  robots  and  sensors 
are  the  types  of  decisions  made  within  the  process. 

The  process  also  contains  a  reactive  element.  This  is 
particularly  useful  for  exception  handling.  For  example,  a 
certain  amount  of  reaction  can  be  handled  within  the  con¬ 
tainment  units  by  dynamically  selecting  the  appropriate  B- 
programs.  Some  situations,  however,  require  higher  level 
support.  A  simple  example  is  that  of  a  timeout.  We  may 
want  to  instantiate  a  particular  containment  unit  for  a  lim¬ 
ited  amount  of  time.  To  do  this  we  inform  the  contain¬ 
ment  unit  of  a  timeout.  When  the  timeout  occurs,  the  pro¬ 
cess  reacts  by  choosing  another  activity  based  upon  the  re¬ 
sults  seen  thus  far.  Another  example  occurs  with  a  process 
intended  to  track  multiple  people.  With  such  a  process, 
we  might  want  to  always  have  one  sensor  responsible  for 
watching  for  new  motion  entering  at  a  door,  while  allow¬ 
ing  the  remaining  resources  to  track  targets  already  in  the 
room.  If  a  new  motion  enters,  the  process  reacts  by  reas¬ 
signing  resources.  The  actual  selection  of  resources  and 
containment  units  and  thus  the  actual  instantiation  of  the 
system  is  made  by  the  integrated  capability  of  robot  plan¬ 
ning  and  scheduling  technologies  whose  description  is  out¬ 
side  the  scope  of  this  paper. 

The Little-JIL process control language, as discussed
above, provides a powerful means of exploiting knowledge
to  structure  planning  and  learning  by  focusing  policy  for¬ 
mation  on  a  small  set  of  legal  programs.  Moreover,  at  lower 
levels,  new  and  enhanced  processes  are  constructed.  The 
objective is to constantly optimize and generalize the active
B-Pgm during training tasks, and to return it at the end
of the task better than we found it. These B-Pgms actually
consist of many coordinated primitive controllers but are
thought of as discrete abstract actions. Subsequent plans
and learning processes can exploit this abstraction.

Figure 11. Sample Little-JIL Process Description for Tracking a Human Subject.

Figure 11 shows a sample Little-JIL process that uses
sensors  to  track  multiple  humans.  We  assume  that  this  pro¬ 
cess  specification  is  in  the  context  of  a  partial  model  of  the 
run-time  environment.  The  root  step  of  the  process  is  Track 
Humans.  This  step  is  decomposed  into  two  steps  that  run 
concurrently  (denoted  by  the  blue  parallel  lines).  One  step 
is  to  track  an  individual  human  while  the  other  step  is  to 
watch  the  door.  The  Watch  Door  step  requires  use  of  the 
panoramic  camera. 

Track  Human  is  a  choice  step.  Dynamically,  the  system 
will  decide  which  of  the  three  substeps  to  use.  This  deci¬ 
sion  will  be  based  upon  which  resources  are  available,  what 
time  constraints  there  are  on  the  tracking,  and  contextual 
issues,  such  as  whether  there  is  good  lighting  or  whether 
the  target  is  moving  quickly.  One  might  easily  imagine 
many  more  than  three  choices  here.  Each  choice  requires 
one  or  more  resources  and  has  some  expected  performance. 
The  scheduler  and  runtime  system  use  knowledge  about  the 
context  to  assist  in  making  the  decision. 

If  another  human  enters  the  room,  this  results  in  an  event 
that  is  handled  by  the  second  Track  Human  step.  This  is 
simply  a  reference  to  the  original  track  human  step  and  will 
result  in  a  new  instance  of  Track  Human  starting  with  a 
new  set  of  resources.  Of  course,  when  a  new  human  en¬ 
ters  the  room,  it  could  be  that  the  existing  resources  are  all 
being  used  to  track  the  people  that  are  in  the  room.  This 
would  result  in  an  exception  causing  some  replanning  and 
reallocation  of  resources  to  occur.  Other  exceptions  can  be 
used to adapt locally (within the CU) during execution. For
example, if there had been normal lighting and the lights
were turned off, we would expect an exception within the
currently active containment units that employ vision sensors.
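
The step hierarchy of the tracking process can be approximated in ordinary code to show how a choice step, a parallel step, and a resource exception interact. Little-JIL itself is a visual language, so the following is not Little-JIL syntax. The substep names, resource labels, and the greedy allocation rule are assumptions made for illustration, and the actual scheduler is outside the scope of this paper.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Set


class Kind(Enum):
    LEAF = "leaf"
    PARALLEL = "parallel"
    CHOICE = "choice"


@dataclass
class Step:
    name: str
    kind: Kind = Kind.LEAF
    resources: Set[str] = field(default_factory=set)
    substeps: List["Step"] = field(default_factory=list)


def allocate(step: Step, available: Set[str]) -> Optional[Set[str]]:
    """Return the resources a step would claim, or None if it cannot run.

    Greedy sketch: a choice step takes its first feasible substep, and
    parallel substeps claim disjoint resources in declaration order.
    """
    if not step.resources <= available:
        return None
    claimed = set(step.resources)
    if step.kind is Kind.CHOICE:
        for option in step.substeps:
            got = allocate(option, available - claimed)
            if got is not None:
                return claimed | got
        return None  # no alternative fits: raise an exception to the parent
    for sub in step.substeps:
        got = allocate(sub, available - claimed)
        if got is None:
            return None
        claimed |= got
    return claimed


# Hypothetical alternatives for the Track Human choice step.
track_human = Step("Track Human", Kind.CHOICE, substeps=[
    Step("Panoramic pair", resources={"panoramic_1", "panoramic_2"}),
    Step("Panoramic and stereo head", resources={"panoramic_2", "stereo_head"}),
    Step("Stereo head and pyroelectric", resources={"stereo_head", "pyroelectric"}),
])
watch_door = Step("Watch Door", resources={"panoramic_1"})
track_humans = Step("Track Humans", Kind.PARALLEL, substeps=[watch_door, track_human])
```

A `None` result at the root corresponds to the case described above in which no resources remain for a new Track Human instance, so that replanning and reallocation are required.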

5  SAFER  Experimental  Platform 

In  our  experimental  platform,  we  have  implemented 
three  types  of  motion  detectors  that  are  deployed  at  fixed 
and  known  positions  in  an  indoor  office-like  environment. 
The platform consists of an articulated stereo vision system,
a scanning pyroelectric sensor, and two panoramic
vision sensors. In each instance of the saccade-foveate
B-Pgm, observations are collected from sensor pairs that are
sufficient  to  determine  a  spatial  location  of  the  moving  fea¬ 
ture  in  the  field  of  view.  This  family  of  functionally  equiv¬ 
alent  programs  produces  a  spatial  estimate  of  a  motion  cue 
with  varying  quality  that  could  serve  as  a  spatial  position 
reference  to  a  subsequent  sensory  or  motor  control  task. 
Indeed,  combinations  of  these  strategies  are  themselves  B- 
Pgms  with  reserved  resources  for  corroboration  or  for  fault 
tolerance.  Which  of  these  to  use  in  a  particular  context  is 
dependent  on  the  task,  the  resources  available,  and  the  ex¬ 
pected  performance  based  on  accumulated  experience. 

5.1  Designing  the  CU  Supervisor  for  Tracking 
Human  Subjects 

The  CU  Supervisor  determines  which  B-Pgm  (sensor 
pair) is recommended for triangulation and tracking given
the  current  state  of  the  process.  In  our  demonstration,  there 
are  six  unique  pairs  of  sensors  available.  A  state  predicate 
describes  the  “liveness”  of  each  pair.  For  a  given  pair,  if 
both sensors are functioning and they are not in a collinear
configuration, the corresponding predicate is set to 1; otherwise
it is set to 0. This is the role of the state estimation
component of Figure 10. Given a pattern in the state vector
defining available sensor pairs, the CU supervisor always
chooses the pair of sensors with the highest value with respect
to the process’ objective function.

Motion Tracking Sensors:

•  1 Pyroelectric Sensor;

•  1 Stereo Head Sensor;

•  2 Panoramic Vision Sensors.

Figure 12. The “Smart Room” - Motion Tracking Platform.

We  have  hand-crafted  a  Human  Tracking  CU  supervi¬ 
sor  for  engaging  sensor  pairs  that  deploys  resources  in  the 
following  priority-based  hierarchy: 

•  Panoramic-stereo  head  (camera  1); 

•  Panoramic-stereo  head  (camera  2); 

•  Stereo-head  (camera  1  and  2); 

•  Panoramic-pyroelectric;

•  Stereo-head (camera 1)-pyroelectric;

•  Stereo-head  (camera  2)-pyroelectric. 

Each  resource  allocation  in  this  hierarchy,  in  turn,  instanti¬ 
ates  two  concurrent  containment  units  for  tracking  motion 
with  a  single  sensor.  These  subordinate  CUs  execute  the 
saccade-foveate  B-Pgm  described  earlier  and  report  to  the 
track human CU. Each CU in this hierarchical control process
has the authority to manage the resources reserved for
it.
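
The liveness predicate and the priority-based hierarchy can be sketched together as follows. The pair labels mirror the list above, while the function names and the representation of collinearity as a set of currently faulted pairs are assumptions made for this illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

Pair = Tuple[str, str]

# Priority order of the hand-crafted hierarchy, highest priority first.
PRIORITY: List[Pair] = [
    ("panoramic", "stereo_cam1"),
    ("panoramic", "stereo_cam2"),
    ("stereo_cam1", "stereo_cam2"),
    ("panoramic", "pyroelectric"),
    ("stereo_cam1", "pyroelectric"),
    ("stereo_cam2", "pyroelectric"),
]


def liveness(pair: Pair, functioning: Dict[str, bool],
             collinear_pairs: Set[Pair]) -> int:
    """1 if both sensors work and the pair is not collinear with the target."""
    a, b = pair
    alive = (functioning.get(a, False) and functioning.get(b, False)
             and pair not in collinear_pairs)
    return int(alive)


def choose_pair(functioning: Dict[str, bool],
                collinear_pairs: Set[Pair]) -> Optional[Pair]:
    """Return the highest-priority live pair, or None if no pair is live."""
    for pair in PRIORITY:
        if liveness(pair, functioning, collinear_pairs):
            return pair
    return None  # no live pair: the fault must be reported upward
```

Re-evaluating `choose_pair` whenever the state vector changes gives the mode-switching behavior of the supervisor: a hardware failure or collinear fault simply drops the affected pairs out of consideration.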

5.2  Experimental  results 

The Human Tracking CU supervisor has been implemented
to control the various sensors in order to track a single
moving person seamlessly through failure modes captured
in the liveness assertion. Some preliminary results
are presented below.

5.2.1  Accuracy  and  Repeatability  Experiments. 

To  design  any  CU  supervisor  that  depends  on  the  coordi¬ 
nated  activity  of  multiple  sensors,  it  is  necessary  to  model 
the  performance  of  the  individual  sensors.  We  conducted 
a  series  of  experiments  to  determine  the  accuracy  and  re¬ 
peatability  of  the  sensors.  At  known  spatial  locations,  a 
motion  cue  was  generated  and  observed  from  the  different 
sensors.  It  was  observed  that  the  panoramic  sensors  were 
both accurate and repeatable, the stereo head was accurate but
not repeatable, and the pyroelectric sensor was repeatable
but not accurate. The data was also used to examine the
quality of triangulation on the motion cue by different sensor
pairs. As expected, the quality degraded as the motion
cue approached the line joining a sensor pair, that is, a collinear
configuration. Because such a configuration is not desirable,
we call this a collinear fault. Conversely, the quality is
best when the motion cue lies along a direction orthogonal to the
line joining a sensor pair.
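
The geometric origin of the collinear fault can be illustrated with a planar triangulation from two bearing measurements. This sketch assumes each sensor reports only a bearing angle to the motion cue in a shared room frame. The function name and the threshold `min_sin` are hypothetical and do not come from the experimental platform.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]


def triangulate(p1: Point, theta1: float, p2: Point, theta2: float,
                min_sin: float = 0.1) -> Optional[Point]:
    """Intersect two bearing rays; return None on a collinear fault.

    p1, p2 are sensor positions (cm); theta1, theta2 are bearings (radians).
    """
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    # 2-D cross product of the ray directions, equal to sin(theta2 - theta1).
    cross = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(cross) < min_sin:
        # Rays are nearly parallel: the target lies near the line joining
        # the sensors and the intersection is ill-conditioned.
        return None
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    # Solve p1 + t1*d1 = p2 + t2*d2 for t1 by crossing both sides with d2.
    t1 = (dx * d2[1] - dy * d2[0]) / cross
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])
```

The divisor `cross` shrinks toward zero as the target approaches the line through the two sensors, so small bearing errors are amplified there; this is consistent with the degradation observed in the experiment.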

5.2.2  Tracking  a  Human  Subject. 

The  next  experiment  evaluated  the  complete  task  of 
tracking  a  single  moving  person  using  combinations  of  the 
four  sensors.  The  results  are  shown  in  Figures  13,  14, 
15 and 16. Figure 13 shows the tracks of the Panoramic-Pyroelectric
pair (Tay) and the Pyroelectric-Stereo head pair
(Tyi). As the motion track crosses collinear sensor geometries,
the performance degrades as expected.

Figure 13. Motion Tracking for the Pyroelectric-Stereo head and
Pyroelectric-Panoramic sensor pairs in the “Smart Room.”
Legend (axes in cm): sensors; collinear lines; human path (right to left at 15 cm/s).
Motion Tracking B-Pgms: Tyi - Pyroelectric and Stereo Head Tracking;
Tay - Panoramic and Pyroelectric Tracking.

Figure 14. Motion Tracking for the Panoramic-Stereo head and
Stereo Head sensor pairs in the “Smart Room.”
Legend (axes in cm): sensors; collinear lines; human path (right to left at 15 cm/s).
Motion Tracking B-Pgms: Tai - Panoramic and Stereo Head Tracking;
Tii - Stereo Head Tracking.

Figure 14 shows the tracks of the Panoramic-Stereo head
pair (Tai) and the Stereo head alone (Tii). Target tracking using
the stereo head alone can be quite poor due to the small
stereo baseline and the mechanical properties of the Stereo
Head platform [2].

Figure 15 shows that localization using the
Panoramic virtual stereo pair (Tap) produces very good results
for large regions of the room. This sensor pair is therefore
highly reliable, leading to its priority in the CU supervisor
for the task. When these resources are available, their
use is well advised both for tracking precision and because of
the complete field of view they provide.

The last example shows the performance of the CU supervisor,
which effects software mode changes in response
to liveness feedback from the sensors. This feedback addresses
both hardware function and the collinearity fault.
Figure 16 shows that the Track Human CU supervisor was
effective in handling these run-time contexts. The observed
track was very close to the true trajectory of the moving test
subject.

The results demonstrate that the hierarchical architecture
is capable of handling faults at both the lower level (i.e.,
sensors) and the higher level (i.e., context of the motion cue).


6 Summary, Conclusions, and Future Experimental Work

Multi-robot  scenarios  present  significant  technical  chal¬ 
lenges  regarding  sensing,  planning,  computing,  and  soft¬ 
ware  methods  and  must  support  both  reactivity  and  pre¬ 
dictability.  Ultimately,  one  of  the  most  desirable  charac¬ 
teristics  of  a  multi-robot  system  is  its  ability  to  adapt  to 
changes  in  the  environment  and  to  internal  faults  -  in  hard¬ 
ware  components  and  in  end-to-end  performance  specifica¬ 
tions.  Thus,  reconfigurability  is  critical. 

Our  current  work  presents  some  preliminary  results  to¬ 
wards  the  responsiveness  to  novel  data  sets,  adaptability 
and  robustness  that  are  critical  to  a  multi-robot  application. 
The  CU  supervisor  that  was  assigned  the  task  of  tracking 
a  human  was  able  to  handle  individual  sensor  faults  (low- 
level)  as  well  as  faults  due  to  context  of  the  motion-cue 
(high-level)  and  seamlessly  track  the  human.  In  conclu¬ 
sion,  our  vertically  integrated  software  environment  can  re¬ 
configure  resources  dynamically  depending  on  a  variety  of 
failures,  making  the  system  robust. 

6.1  Doorway  Abstraction 

In  a  real  situation,  the  agents  will  have  to  cooperate  with 
each  other  and  pool  their  sensory  information  and  knowl¬ 
edge  to  build  a  reliable  model  of  their  environment.  To 
demonstrate  one  such  situation,  we  plan  to  give  the  agents 
the task of doorway abstraction. The doorway is an interesting
feature to learn and incorporate into the model because
that is where motion cues of interest often originate.
A panoramic sensor could be useful in this task because its
field of view is 360 degrees, unlike other sensors, which have
to saccade to a region of interest. The panoramic sensor can
maintain a certainty value for different regions in its field of
view. The value indicates the certainty of a doorway being
in that region. Whenever a motion cue appears or disappears
in a region, its certainty value is increased. In this way,
after a certain period of time, a reliable model of the doorways
could be built. Of course, some of these regions need not be
doorways but just simple occlusions. This can be handled
by using two panoramic sensors, one of which is mobile
and can position itself where it can corroborate the
presence or absence of a doorway.

Figure 15. Motion Tracking for the Panoramic-Panoramic
sensor pair in the “Smart Room.”

Figure 16. Motion Tracking Performance during Mode Changes
in the Motion Tracking CU supervisor. Legend: CU Supervisor -
Composition of Tyi, Tay, Tai, and Tii based on collinearity and
liveness faults.
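
The certainty update proposed for doorway abstraction can be sketched as follows. Since this is planned work, the sketch makes several assumptions: the panoramic field of view is divided into equal angular regions, each appearance or disappearance event adds a fixed increment capped at 1, and the names `DoorwayModel`, `step`, and `threshold` are hypothetical. Corroboration by a second, mobile sensor is not modeled.

```python
from typing import List


class DoorwayModel:
    """Per-region doorway certainty over a 360-degree field of view."""

    def __init__(self, n_regions: int = 36, step: float = 0.25) -> None:
        self.n_regions = n_regions
        self.step = step
        self.certainty: List[float] = [0.0] * n_regions

    def region_of(self, bearing_deg: float) -> int:
        # Map any bearing (including negative or > 360) to a region index.
        return int((bearing_deg % 360.0) // (360.0 / self.n_regions))

    def cue_event(self, bearing_deg: float) -> int:
        """Record a motion cue appearing or disappearing at this bearing."""
        i = self.region_of(bearing_deg)
        self.certainty[i] = min(1.0, self.certainty[i] + self.step)
        return i

    def likely_doorways(self, threshold: float = 0.5) -> List[int]:
        """Indices of regions whose certainty has reached the threshold."""
        return [i for i, c in enumerate(self.certainty) if c >= threshold]
```

Regions returned by `likely_doorways` would still include simple occlusions, which is why the text proposes a second panoramic sensor to corroborate them from another vantage point.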

6.2  Multiple  Target  Corroboration 

When there are multiple targets, triangulation is no
longer trivial. The different types of sensors come to our aid
in this situation. For example, if a panoramic sensor is
tracking two motion cues, say a person and a robot, the
pyroelectric sensor can tell us which of them is a person,
so that we can selectively track the motion of more interest
to us. Another scenario is when two motion cues intersect.
This problem was discussed earlier in Section 2.1. When the
scenario involves multiple humans, it is more challenging.
In this case, we plan to use a monocular camera to zoom
in on regions of interest as given by the panoramic sensor,
as shown in Section 2.6. This helps us to capture and
record signatures of these motion cues, such as color and shape.
The higher levels can reason over these and help to decide
which targets to triangulate on.